CHANGE ANALYSIS


by  Emanuel  Parzen 

Department  of  Statistics,  Texas  A&M  University 
College  Station,  Texas  77843-3143 


Abstract 

Change Analysis “in the strict sense” is concerned with the problem of
detecting and estimating slow and abrupt changes in the probability distributions of
successive observations Y(t) of a variable or system. This paper has two goals: (1)
introduce an approach to change problems based on the analysis of Score Change
Processes (whose idea is to study whether a model fitted to a whole data set fails to
fit it by “random walking” the parameter estimating equations); (2) develop analogies
between four basic statistics problems, corresponding to the standard assumptions
made about a sequence of observations Y(t), t = 1, ..., n; test the hypotheses: A:
Distribution of specified parametric form, B: Independence, C: Identical
distribution. For a sequence of bivariate observations (X(t), Y(t)) one would like to test
D: Independence of X and Y. Contents are: Introduction, Change analysis in the
strict sense (test Assumption C), Goodness of fit (test Assumption A), Spectral
Analysis (test Assumption B), Four phases of change analysis, Parametric scores
change analysis, Nonparametric scores change analysis.

1.  Introduction 

Data Y(1), ..., Y(n) which can be regarded as continuous random variables
observed sequentially can be called indexed data or a time series. Classic statistical
inference makes three basic assumptions:

Assumption A. The probability law of each Y has a probability density belonging to
a known parametric family of probability densities f(y; θ).

Assumption B. Random variables Y(1), ..., Y(n) are independent.

Assumption C. Random variables Y(1), ..., Y(n) are identically distributed.

Methods  for  detecting  (and  estimating)  the  fit  (and  the  nature  of  violations) 
of  these  assumptions  in  our  opinion  can  be  respectively  related  to  three  parallel 
theories: 

Theory  C.  Changepoint  analysis  or  change  analysis  (in  the  strict  sense). 

Theory B. Spectral analysis (time series analysis in the frequency domain).

Theory  A.  Goodness  of  fit. 

We  believe  that  one  can  define  a  theory,  called  Comparison  Change  Analysis, 
which  is  intended  to  study  analogies  between  theories  A,B,C  (and  bring  the  insights 
of  the  theories  that  are  more  developed,  such  as  spectral  analysis,  to  less  developed 
ones). General accounts of this theory are given in Parzen (1992), (1991).

The assumption that the data is observed sequentially, which may seem to limit
the applicability of Change Analysis, is dropped when the analogies are extended
to the bivariate data analysis problem, which considers independent bivariate data
(X(t), Y(t)), t = 1, ..., n, and desires to model the relation between X and Y and
in particular to test

Assumption D. X and Y are independent random variables.

A general non-parametric theory of testing Assumption D can be related to

Theory D. Change analysis (random effect).

Analogies between theories A to D are obtained from the fact that in each
problem the first step in the analysis is to define a dynamic statistic which is a function
on the unit interval [0,1] whose asymptotic distribution (under the null hypothesis
that the assumptions are true) is either a Brownian Bridge or a related process. The
test statistics in each theory are analogous to the nonparametric test statistics that
statisticians have developed to test goodness of fit for equality of two distributions.

Textbooks  imply  it  is  difficult  to  choose  among  the  many  test  statistics  for 
goodness  of  fit  and  analogous  testing  problems;  we  believe  we  should  be  optimistic 
about  our  ultimate  ability  to  develop  procedures  for  adaptively  choosing  appropriate 
test  statistics  which  not  only  test  the  null  hypothesis  but  also  suggest  likely  models 
instead  of  only  rejecting  the  null  hypothesis. 

2.  Change  analysis  in  the  strict  sense  (test  Assumption  C) 

The theory of change analysis in the strict sense considers data Y(t), t = 1, ..., n,
which represents a transformation of observed data (the identity transformation
leaves the data unchanged).

Let Y~ denote the sample mean (an estimator of the true mean μ if the data
are identically distributed). Let σ_Y~ denote a suitable estimator (such as the
sample standard deviation) of the true standard deviation σ_Y of the data under the
assumption of identical distribution (which is assumed to be finite).

The data Y is transformed to normalized data

\[ \tilde{Y}(t) = (Y(t) - \tilde{Y}) / \tilde{\sigma}_Y. \]

We plot the normalized data as a sample change density c~(τ), 0 < τ < 1, defined
to be a piecewise constant function whose value is equal to Y~(j) on the interval
(j - 1)/n < τ < j/n, for j = 1, ..., n. Note that

\[ \int_0^1 \tilde{c}(\tau)\,d\tau = 0, \qquad \int_0^1 |\tilde{c}(\tau)|^2\,d\tau = 1. \]

CUSUMS (cumulative sums) are becoming increasingly important diagnostic
tools to look for patterns in indexed data. They are related to the sample change
process on 0 < τ < 1

\[ \tilde{C}(\tau) = \int_0^{\tau} \tilde{c}(s)\,ds. \]

The points τ = j/n for j = 1, ..., n are called “exact” values of τ; at these points
C~(τ) equals a cumulative sum:

\[ \tilde{C}(j/n) = (1/n) \sum_{k=1}^{j} \tilde{Y}(k) = (j/n)\,\tilde{Y}_j, \]

where Y~_j denotes the mean of the first j normalized values.

To understand why the change process is an effective means of detecting change
in the data consider its behavior under two models for Y(.).

If Y(.) is deterministic and linear, say Y(t) = t, then at exact τ = j/n
approximately

\[ \tilde{c}(\tau) = \tilde{Y}(j) = 12^{.5} (\tau - .5), \]
\[ \tilde{C}(\tau) = (-.5)\,12^{.5}\,\tau (1 - \tau). \]

The graph of C~(τ) when Y(.) is linear is a parabola that goes from 0 to 0 with
minimum value at τ = .5.

If Y(.) is random (independent identically distributed), the stochastic process
C~(τ), 0 < τ < 1, multiplied by n^.5, can be shown to be asymptotically distributed
(as the sample size n tends to infinity) as a Brownian Bridge stochastic process
B(τ), 0 < τ < 1, which is a zero mean Gaussian process with covariance kernel
E[B(s)B(t)] = min(s, t) - st. Note that B(0) = B(1) = 0, and

\[ \mathrm{Variance}[B(\tau)] = \tau (1 - \tau). \]

To test for departures from Assumption C (identical distribution) one tests if the
observed change process C~(τ) is significantly different from a sample curve of a
Brownian Bridge, which can be expected to be a wiggly (non-smooth) curve
oscillating about the horizontal axis.
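
The two behaviors of the change process described above can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available; the function name change_process, the sample size, and the random seed are illustrative choices and not part of the paper. It normalizes with the 1/n sample standard deviation, so that the change density has mean 0 and mean square 1.

```python
import numpy as np

def change_process(y):
    """Return (tau, C): exact values tau = j/n and the CUSUM C~(j/n)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Normalized data; np.std uses the 1/n convention by default.
    z = (y - y.mean()) / y.std()
    tau = np.arange(1, n + 1) / n
    return tau, np.cumsum(z) / n

n = 2000  # illustrative sample size

# Deterministic linear data Y(t) = t: the change process follows the parabola.
tau, C_lin = change_process(np.arange(1, n + 1))
parabola = -0.5 * np.sqrt(12.0) * tau * (1.0 - tau)
max_gap = np.max(np.abs(C_lin - parabola))

# Independent identically distributed data: n^.5 * C~ wiggles about zero
# on the scale of a Brownian Bridge.
rng = np.random.default_rng(0)  # arbitrary seed
_, C_iid = change_process(rng.normal(size=n))
scaled_sup_iid = np.sqrt(n) * np.max(np.abs(C_iid))
scaled_sup_lin = np.sqrt(n) * np.max(np.abs(C_lin))
```

For the linear sequence the scaled supremum grows like n^.5, while for the random sequence it stays of order one, which is what makes the plot of C~(τ) a useful diagnostic.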

A related process that plays a central role in change analysis is the Change
Test Process

\[ \widetilde{CT}(\tau) = \tilde{C}(\tau) / (\tau (1 - \tau))^{.5}. \]

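The fundamental role of the change test process starts with the fact that for
fixed τ = j/n, CT~(τ) can be shown to be a monotone transformation of the
classic two-sample Student's t-test statistic of the null hypothesis μ1 = μ2 in the
model Y(1), ..., Y(j) is Normal(μ1, σ²) and Y(j + 1), ..., Y(n) is Normal(μ2, σ²).
The sample means and variances of the two samples Y(1), ..., Y(j) and
Y(j + 1), ..., Y(n) are respectively denoted μ1~, μ2~ and σ1~², σ2~². The pooled sample
variance is

\[ \tilde{\sigma}^2 = \tau \tilde{\sigma}_1^2 + (1 - \tau) \tilde{\sigma}_2^2. \]

One can verify that

\[ \tilde{\sigma}_Y^2 = \tilde{\sigma}^2 + \tau (1 - \tau) (\tilde{\mu}_1 - \tilde{\mu}_2)^2, \]
\[ \tilde{\mu}_1 - \tilde{Y} = (1 - \tau) (\tilde{\mu}_1 - \tilde{\mu}_2). \]

The classic two-sample Student's t-test statistic is T, defining

\[ T = (\tau (1 - \tau))^{.5} (\tilde{\mu}_1 - \tilde{\mu}_2) / \tilde{\sigma}. \]

Define R, a “correlation version” of T, by

\[ R^2 = T^2 / (1 + T^2), \qquad T^2 = R^2 / (1 - R^2), \]
\[ R^2 = \tau (1 - \tau) (\tilde{\mu}_1 - \tilde{\mu}_2)^2 / \tilde{\sigma}_Y^2, \]
\[ \tilde{\sigma}^2 = (1 - R^2)\,\tilde{\sigma}_Y^2. \]

Then

\[ \tilde{C}(\tau) = \tau (\tilde{\mu}_1 - \tilde{Y}) / \tilde{\sigma}_Y, \qquad \widetilde{CT}(\tau) = (\tau / (1 - \tau))^{.5} (\tilde{\mu}_1 - \tilde{Y}) / \tilde{\sigma}_Y, \]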


and one concludes that CT~(τ) is, like R, a correlation type statistic since

\[ R^2 = (\tau / (1 - \tau)) ((\tilde{\mu}_1 - \tilde{Y}) / \tilde{\sigma}_Y)^2 = |\widetilde{CT}(\tau)|^2. \]


We can consequently express Student's t-test statistic T as a monotone function of
CT~(τ) since T = R/(1 - R²)^.5.

Let τ^ denote the value among the exact values τ = j/n (for j = 1, ..., n - 1) at
which the absolute value of CT~(τ) achieves its maximum. Under the assumption of
at most one change in the distribution of Y(.), CT~(τ^) is a test statistic for change
and its time of occurrence is consistently estimated by τ^ (a result established by
Carlstein (1988)).
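
As an illustration of the change test process and of the estimator τ^, here is a minimal Python sketch assuming NumPy. The helper names change_test_process and two_sample_T, the simulated shift in mean, and the seed are hypothetical choices made for the example. Variances use the 1/n convention so that the identity T = R/(1 - R²)^.5 holds exactly.

```python
import numpy as np

def change_test_process(y):
    """Return (tau, CT) at the exact values tau = j/n, j = 1, ..., n-1."""
    y = np.asarray(y, dtype=float)
    n = y.size
    z = (y - y.mean()) / y.std()
    tau = np.arange(1, n) / n
    C = np.cumsum(z)[:-1] / n
    return tau, C / np.sqrt(tau * (1.0 - tau))

def two_sample_T(y, j):
    """T = (tau(1-tau))^.5 (mu1~ - mu2~)/sigma~ for a split after observation j."""
    y = np.asarray(y, dtype=float)
    tau = j / y.size
    first, second = y[:j], y[j:]
    pooled = tau * first.var() + (1.0 - tau) * second.var()
    return np.sqrt(tau * (1.0 - tau)) * (first.mean() - second.mean()) / np.sqrt(pooled)

# Simulated data with one change in mean after observation 120 (tau = .6).
rng = np.random.default_rng(0)  # arbitrary seed
y = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(2.0, 1.0, 80)])

tau, CT = change_test_process(y)
j_hat = int(np.argmax(np.abs(CT))) + 1   # index of the maximizing exact value
tau_hat = j_hat / y.size                 # estimated time of change
R = CT[j_hat - 1]                        # correlation type statistic
T = two_sample_T(y, j_hat)               # two-sample statistic at the same split
```

The pair (R, T) computed at the estimated changepoint can be compared directly, which makes the monotone relation between the two statistics easy to inspect on real data.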

3.  Goodness  of  fit  (Test  Assumption  A) 

One  of  the  most  extensive  and  least  applied  branches  of  statistical  theory  is 
the  theory  of  goodness  of  fit  of  probability  models  to  observed  data.  Despite  its 
importance (both for theory and practice) it appears to be sparsely taught to
graduate students in statistics. The chi-squared goodness of fit test introduced by Karl
Pearson in 1900 is regarded as one of the top 20 achievements in modern science.
How  can  one  explain  the  neglect  of  instruction  in  its  theory?  One  explanation  may 
be  that  its  theory  is  often  taught  rigorously  as  a  study  in  pure  probability  theory 
rather  than  developed  vigorously  for  its  statistical  interpretation. 

Let Y(t), t = 1, ..., n, be a random sample of a continuous random variable with
true distribution F(y) = F(y; θ0) belonging to a finite parametric family F(y; θ).
The true quantile function is F^{-1}(u; θ0), 0 < u < 1. The sample distribution
function is denoted

F~(y) = fraction of sample ≤ y.

Let θ^ denote the maximum likelihood estimator of θ. Stochastic processes whose
asymptotic properties are of interest (for both theory and practice) are

\[ \tilde{F}(y) - F(y; \theta_0), \]
\[ \tilde{F}(y) - F(y; \hat{\theta}), \]
\[ F(y; \hat{\theta}) - F(y; \theta_0), \]

evaluated at y = F^{-1}(u; θ0), 0 < u < 1. We denote such a process C~(u), 0 < u < 1,
to emphasize its analogy to a sample change process. We use functions of u to study
changes of distribution, and functions of τ to study changes of models fitting data.

The testing and estimation procedures of goodness of fit theory can be organized
into four phases summarized (in section 5) in our discussion of the four phases of
change analysis.
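
To make the first of these processes concrete, the following minimal Python sketch (assuming NumPy) evaluates C~(u) = F~(F^{-1}(u; θ0)) - u on a grid of u values. The exponential family F(y; θ) = 1 - exp(-y/θ), the function name fit_change_process, and the idealized sample of exact quantiles are assumptions made only for illustration; the paper does not single out this family.

```python
import numpy as np

def fit_change_process(y, theta0, u_grid):
    """C~(u) = F~(F^{-1}(u; theta0)) - u for the exponential model
    F(y; theta) = 1 - exp(-y/theta) (an illustrative choice of family)."""
    y = np.asarray(y, dtype=float)
    # F~(F^{-1}(u)) is the fraction of transformed values F(Y(t); theta0) <= u.
    u_data = np.sort(1.0 - np.exp(-y / theta0))
    F_tilde = np.searchsorted(u_data, u_grid, side="right") / y.size
    return F_tilde - u_grid

n = 50
u_grid = np.linspace(0.01, 0.99, 99)
# Idealized sample: exact quantiles of the exponential law with theta = 1.
y = -np.log(1.0 - (np.arange(1, n + 1) - 0.5) / n)

C_true = fit_change_process(y, theta0=1.0, u_grid=u_grid)    # correct model
C_wrong = fit_change_process(y, theta0=3.0, u_grid=u_grid)   # misspecified scale
```

Under the correct model the process stays within about .5/n of zero for this idealized sample, while the misspecified scale produces a smooth, one-signed departure.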

4.  Spectral  Analysis  (Test  Assumption  B) 

One approach to testing the assumption of independence is to consider as an
alternative hypothesis for the data Y(t), t = 1, ..., n, that it is a zero mean stationary
time series with covariance function, defined for v = 0, ±1, ±2, ...,

\[ R(v) = E[Y(t) Y(t - v)] \]

and spectral density function, defined for 0 < ω < 1,

\[ f(\omega) = \sum_{v=-\infty}^{\infty} R(v) \exp(-2\pi i \omega v). \]

The sample spectral density is defined

\[ \tilde{f}(\omega) = (1/n) \left| \sum_{t=1}^{n} Y(t) \exp(-2\pi i \omega t) \right|^2 \]

with sample distribution function (on 0 < ω < 1)

\[ \tilde{F}(\omega) = \int_0^{\omega} \tilde{f}(\lambda)\,d\lambda. \]

Normalized versions of these functions are

\[ \tilde{f}(\omega) / \tilde{F}(1), \qquad \tilde{F}(\omega) / \tilde{F}(1). \]

Analogues of the sample change density and sample change process are

\[ \tilde{c}(\omega) = \tilde{f}(\omega) / \tilde{F}(1) - 1, \]
\[ \tilde{C}(\omega) = \tilde{F}(\omega) / \tilde{F}(1) - \omega. \]
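
The spectral analogues of the change density and change process can be sketched as follows in Python, assuming NumPy. The sample spectral density is taken here to be the periodogram evaluated at the frequencies k/n via the fast Fourier transform, and the integral is approximated by a Riemann sum; both are assumptions of this sketch, as are the function name spectral_change_process, the autoregressive example, and the seed.

```python
import numpy as np

def spectral_change_process(y):
    """Return (freq, c, omega, C): change density at freq = k/n, k = 0..n-1,
    and change process at omega = k/n, k = 1..n."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                       # work with a zero mean series
    n = y.size
    f = np.abs(np.fft.fft(y)) ** 2 / n     # periodogram at 0, 1/n, ..., (n-1)/n
    F = np.cumsum(f) / n                   # Riemann sum for the integral of f~
    F1 = F[-1]                             # F~(1), the sample variance
    freq = np.arange(n) / n
    omega = np.arange(1, n + 1) / n
    return freq, f / F1 - 1.0, omega, F / F1 - omega

n = 1024
rng = np.random.default_rng(0)             # arbitrary seed
e = rng.normal(size=n)

# Independent data: the change process stays close to zero.
_, c_white, _, C_white = spectral_change_process(e)

# Dependent data (first-order autoregression with coefficient .9):
# power concentrates at low frequencies and the change process departs from zero.
ar = np.empty(n)
ar[0] = e[0]
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + e[t]
_, c_ar, _, C_ar = spectral_change_process(ar)
```

Because the series is real, the periodogram is symmetric about ω = .5, so in practice one may restrict attention to 0 < ω < .5.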

5.  Four  phases  of  change  analysis 

A sample change process C~(τ), 0 < τ < 1, is a dynamic statistic (sample
path of a stochastic process) which often can be shown to satisfy under the null
hypothesis of “no change” the null hypothesis H0: C~(.) is a Brownian Bridge (or
a related hypothesis). The statistical analysis of C~(.) has four phases:

Phase 1: Graphical analysis. Is the plot of C~(τ), 0 < τ < 1, oscillatory, a
deterministic parabola, or some other pattern?

Phase 2: Non-linear functionals. One tests H0 by computing the values of test
statistics (whose asymptotic distributions under H0 can be deduced from the
theory of empirical processes)

\[ \int_0^1 |\tilde{C}(\tau)|^2\,d\tau, \]
\[ \int_0^1 \left( |\tilde{C}(\tau)|^2 / (\tau (1 - \tau)) \right) d\tau, \]
\[ \max_{\tau = j/n} |\tilde{C}(\tau)| / (\tau (1 - \tau))^{.5}. \]


Phase 3: Linear functionals. For various score functions K(τ), called change
score functions, one computes the linear functional (or component)

\[ \tilde{C}(K) = \int_0^1 K(\tau)\,d\tilde{C}(\tau) = \int_0^1 K(\tau)\,\tilde{c}(\tau)\,d\tau. \]

One can often write approximately

\[ \tilde{C}(K) = (1/n) \sum_{j=1}^{n} K((j - .5)/n)\,\tilde{c}((j - .5)/n). \]

The score function is usually chosen from a sequence of orthonormal functions
ψ_1(.), ψ_2(.), ..., especially the Legendre polynomials, which test against patterns in
the change density c~(τ).

The key to change analysis is to choose transformations of data (score the data)
which are most powerful for detecting change. From the sample change processes,
suitable linear functionals (score the change) are formed. These linear functionals
are called “double score components”. One can define bivariate density functions
d(τ, u), 0 < τ < 1, 0 < u < 1, of which double score functions are diagnostics.
The choice of data score functions is motivated in sections 6 and 7 parametrically and
non-parametrically, respectively.

Phase 4: Density estimation. By one of the many methods available in the
theory of curve smoothing (kernel methods, splines, exponential methods, wavelets,
etc.) form a smooth estimator c^(τ) of the change density.

An  exposition  of  the  theory  of  these  phases  would  require  a  book  and  is  beyond 
the  scope  of  this  paper.  Our  goal  in  this  paper  is  to  outline  the  phases  and  to 
explain  how  we  choose  transformations  of  the  original  data  from  which  to  form  a 
change  process. 
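
The quantities of Phases 2 and 3 can be computed from normalized data with a few lines of Python. This is a minimal sketch assuming NumPy; the function name phase_statistics, the dictionary keys, the use of Riemann sums for the integrals, and the choice of the first two orthonormal Legendre polynomials on [0,1] as change score functions are all illustrative assumptions.

```python
import numpy as np

def phase_statistics(z):
    """Phase 2 and Phase 3 summaries of the change process of normalized data z
    (z is assumed to have mean 0 and mean square 1)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    C = (np.cumsum(z) / n)[:-1]            # C~(j/n) for j = 1, ..., n-1
    tau = np.arange(1, n) / n
    w = tau * (1.0 - tau)
    mid = (np.arange(1, n + 1) - 0.5) / n  # midpoints (j - .5)/n
    K1 = np.sqrt(12.0) * (mid - 0.5)                     # Legendre, degree 1
    K2 = np.sqrt(5.0) * (6.0 * (mid - 0.5) ** 2 - 0.5)   # Legendre, degree 2
    return {
        "integral_C2": np.sum(C ** 2) / n,               # integral of |C~|^2
        "weighted_integral_C2": np.sum(C ** 2 / w) / n,  # weight 1/(tau(1-tau))
        "max_abs_CT": np.max(np.abs(C) / np.sqrt(w)),    # maximum over exact values
        "component_1": np.mean(K1 * z),                  # linear functional C~(K1)
        "component_2": np.mean(K2 * z),                  # linear functional C~(K2)
    }

# Deterministic linear data Y(t) = t, for which the values are known in closed form.
n = 1000
y = np.arange(1, n + 1, dtype=float)
stats = phase_statistics((y - y.mean()) / y.std())
```

For the linear sequence the parabola C~(τ) = (-.5) 12^.5 τ(1 - τ) gives approximately .1, .5 and .866 for the three non-linear functionals, while the first Legendre component is close to 1 and the second close to 0, so the components identify the linear pattern in the change density.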

6.  Parametric  scores  change  analysis 

To detect change over time in a sequence one must have some prior opinion
about the ways in which the probability distribution of the observations may be
changing (such as in location, scale, skewness, etc.). Sample change processes are
formed for transformed data, where the transformation is called intuitively a data
score function. The most powerful data transformations are essentially the sufficient
statistics, or more precisely the Fisher score functions, when one has a parametric
model f(y; θ) for a random sample Y(t), t = 1, ..., n, where θ = (θ_1, ..., θ_k).

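The maximum likelihood estimator θ^ is obtained by maximizing the average
log-likelihood

\[ (1/n) \sum_{t=1}^{n} \log f(Y(t); \theta). \]

Define score functions

\[ S_j(Y; \theta) = (\partial / \partial \theta_j) \log f(Y; \theta). \]

The maximum likelihood estimator is the solution of the estimating equations for
j = 1, ..., k

\[ (1/n) \sum_{t=1}^{n} S_j(Y(t); \hat{\theta}) = 0. \]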

Our approach to change analysis asks if for every potential changepoint τ =
m/n the parametric model with θ = θ^ fits the data Y(t), t = 1, ..., m, up to the
time m in the sense that approximately

\[ (1/n) \sum_{t=1}^{m} S_j(Y(t); \hat{\theta}) = 0. \]

We define the score change process to linearly interpolate its values at τ = m/n,
for m = 1, ..., n

\[ \tilde{C}(\tau; S_j^*) = (1/n) \sum_{t=1}^{m} S_j^*(Y(t); \hat{\theta}) \]

where the normalized score, scaled to have unit variance under the fitted model, is

\[ S_j^*(Y; \hat{\theta}) = S_j(Y; \hat{\theta}) / (E[|S_j(Y; \hat{\theta})|^2])^{.5}. \]

We form k score change processes, for j = 1, ..., k.

We call this approach “random walk your normalized scores.” We are developing
the probability theory of the score change processes.

These theoretical concepts can best be understood through examples. Consider
a gamma distribution model

\[ f(y; \theta) = (1 / (\theta\,\Gamma(\nu)))\,(y/\theta)^{\nu - 1} \exp(-y/\theta) \]

where θ is a positive scale parameter, assumed unknown, and ν is a positive shape
parameter, assumed known. One can show that the score function of the parameter is

\[ S(Y; \theta) = (1/\theta) ((Y/\theta) - \nu); \]

the maximum likelihood estimator is

\[ \hat{\theta} = \tilde{Y} / \nu; \]

the normalized score function evaluated at the maximum likelihood estimator of
the parameter may be shown to be

\[ S^*(Y; \hat{\theta}) = \nu^{.5} ((Y / \tilde{Y}) - 1). \]

To test the observations Y(.) for change, one forms the maximum likelihood
score change process C~(τ; S*), 0 < τ < 1, and tests if this dynamic statistic is
significantly different from a sample path of a Brownian Bridge stochastic process.
A linear functional of the change process corresponding to the score function

\[ K(\tau) = 12^{.5} (\tau - .5) \]

is

\[ \tilde{C}(K, S^*) = (1/n) \sum_{t=1}^{n} (12\nu)^{.5} ((Y(t)/\tilde{Y}) - 1) (((t - .5)/n) - .5) \]
\[ = (12\nu)^{.5} (1/n) \sum_{t=1}^{n} Y(t) (((t - .5)/n) - .5) / \tilde{Y}. \]

Under the null hypothesis of no change the asymptotic distribution of n^.5 C~(K, S*)
is Normal(0,1).

An example of an application of this statistic is in Hsu (1979) where it is
presented as a test designed for a small change in the scale parameter θ of an
independent Gamma distributed sequence, derived by Kander and Zacks (1966) by a
Bayesian analysis assuming the changepoint τ is uniformly distributed in time. This
test statistic is derived in our approach as analogous to a component in standard
goodness of fit analysis.
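
The gamma example can be traced numerically with the following minimal Python sketch, assuming NumPy. The function name gamma_score_change, the simulated change in scale, and the seed are hypothetical choices for illustration. The sketch forms the normalized score ν^.5 (Y(t)/Y~ - 1), its change process, and the statistic n^.5 C~(K, S*) with K(τ) = 12^.5 (τ - .5).

```python
import numpy as np

def gamma_score_change(y, nu):
    """Score change process and linear functional for the gamma scale model
    with known shape nu."""
    y = np.asarray(y, dtype=float)
    n = y.size
    s_star = np.sqrt(nu) * (y / y.mean() - 1.0)   # normalized score at theta^ = Y~/nu
    C = np.cumsum(s_star) / n                     # score change process at tau = m/n
    mid = (np.arange(1, n + 1) - 0.5) / n
    K = np.sqrt(12.0) * (mid - 0.5)               # change score function
    stat = np.sqrt(n) * np.mean(K * s_star)       # n^.5 C~(K, S*)
    return C, stat

# Simulated sequence whose scale parameter changes from 1 to 3 halfway through.
rng = np.random.default_rng(1)  # arbitrary seed
nu = 2.0
y = np.concatenate([rng.gamma(nu, 1.0, 150), rng.gamma(nu, 3.0, 150)])
C, stat = gamma_score_change(y, nu)

# Second form of the statistic, written in terms of Y(t) alone.
n = y.size
mid = (np.arange(1, n + 1) - 0.5) / n
alt = np.sqrt(n) * np.sqrt(12.0 * nu) * np.mean(y * (mid - 0.5)) / y.mean()
```

The change process returns to zero at τ = 1 because the maximum likelihood estimator solves the estimating equation, and the statistic can be referred to the Normal(0,1) distribution under the null hypothesis of no change.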

7.  Nonparametric  scores  change  analysis 

Our  approach  to  change  analysis  recommends  that  one  compute  and  interpret 
several  change  processes  formed  from  several  transformations  of  the  original  data. 
In  addition  to  (or  instead  of)  various  parametric  score  change  processes,  one  can 
define various nonparametric score processes for a data sequence Y(t), t = 1, ..., n.
Define:

sample distribution function F~(y);

sample probability mass function p~(y) = fraction of sample equal to y;

mid-distribution function P~(y) = F~(y) - .5 p~(y).

The mid-rank data transformation forms P~(Y(t)), t = 1, ..., n. When all Y
values are distinct, P~(Y(t)) = (Rank(Y(t)) - .5)/n; we recommend this definition
of mid-ranks over the most used definition Rank(Y(t))/(n + 1).

One chooses a data score function J(u), 0 < u < 1, suitable for testing
non-parametrically various types of changes in the distribution of the data (especially
changes in location or scale parameters). A typical choice for J(u) is a Legendre
polynomial normalized to satisfy

\[ \int_0^1 J(u)\,du = 0, \qquad \int_0^1 |J(u)|^2\,du = 1. \]

Apply the four phases of change analysis to the transformed data sequence
J(P~(Y(t))). In the third phase one examines and interprets linear functional tests
for change of the form

\[ \tilde{C}(K, J) = (1/n) \sum_{t=1}^{n} K((t - .5)/n)\,J(\tilde{P}(Y(t))) \]

for suitable change score functions K(τ). One can usually show that under the null
hypothesis of no change the asymptotic distribution of n^.5 C~(K, J) is Normal(0,1).
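
The mid-rank transformation and the resulting linear functional test can be sketched in Python as follows, assuming NumPy. The function names mid_distribution, legendre1 and rank_change_statistic are hypothetical, and the first orthonormal Legendre polynomial 12^.5 (u - .5) is used for both J and K purely as an illustrative choice.

```python
import numpy as np

def mid_distribution(y):
    """P~(Y(t)) = F~(Y(t)) - .5 p~(Y(t)) for each observation."""
    y = np.asarray(y, dtype=float)
    s = np.sort(y)
    upper = np.searchsorted(s, y, side="right") / y.size  # F~(y): fraction <= y
    lower = np.searchsorted(s, y, side="left") / y.size   # fraction < y
    return 0.5 * (upper + lower)                          # equals F~ - .5 p~

def legendre1(u):
    """First orthonormal Legendre polynomial on [0,1]."""
    return np.sqrt(12.0) * (np.asarray(u, dtype=float) - 0.5)

def rank_change_statistic(y, K=legendre1, J=legendre1):
    """n^.5 C~(K, J) computed from mid-rank scores."""
    n = len(y)
    mid = (np.arange(1, n + 1) - 0.5) / n
    return np.sqrt(n) * np.mean(K(mid) * J(mid_distribution(y)))

P_distinct = mid_distribution([3.2, 1.1, 2.5, 4.0])   # distinct values
P_ties = mid_distribution([1.0, 1.0, 2.0])            # tied values share a mid-rank
stat_trend = rank_change_statistic(np.arange(1.0, 101.0))  # monotone trend, n = 100
```

With distinct values the transformation reduces to (Rank - .5)/n, and tied values receive the average of the positions they occupy. For a strictly increasing sequence of length 100 the statistic is close to 10, far out in the tail of the Normal(0,1) reference distribution.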